Estimating Proportional Hazards Model Using Frequentist and Bayesian Approaches

In statistics, the proportional hazards model (PHM) is one of a class of survival models. This model estimates the effects of different covariates influencing the time-to-event data in which the hazard function has been assumed to be the product of the baseline hazard function and a non-negative function of covariates. In this study, we investigate the hazard function, also known as the risk function or intensity function, which is employed in modelling the survival data and waiting times. The model parameters can be estimated via frequentist or Bayesian approach. However, the Bayesian approach is well known to have the advantages over frequentist methods when the data are small in size and involve censored individuals. In this paper, the PHM for right-censored data from Bayesian perspective will be discussed and the Markov Chain Monte Carlo (MCMC) method will be used to estimate the posterior distributions of the model parameters using Leukemia data. | Proportional hazards model | Hazard function | Bayesian | MCMC | ® 2012 Ibnu Sina Institute. All rights reserved. http://dx.doi.org/10.11113/mjfas.v8n2.126


INTRODUCTION
Modelling and survival data analysis is one of the oldest fields in statistics.Generally, survival data deals with lifetimes from some initial event at time zero to some terminal event of interest and basically the data would be an independent non-negative random variable, say T. A major advancement in survival analysis took place in the era of 1950's where Kaplan and Meier (1958) proposed their famous estimator of the survival curve [1].Later in 1972, David Cox [2] introduced the proportional hazards model which incorporates covariates.
An important issue that always arises when studying time to event data is that some individuals are still alive or survive at the end of the study.This implies that the event of interest has not occurred and therefore results in the emergence of right censored data.The censoring process is called non-informative if the risk set at any time point for those who are still alive and uncensored, should be representative of the entire population within the same cohort.Non-informative censoring automatically occurs if the censoring process is independent of the survival time.
General classes of semi-parametric hazards regression model for survival data which include Cox proportional hazards model, the accelerated failure time model and the accelerated hazards model have been proposed by Chen and Jewell [3].Their new models are flexible and may yield more accurate prediction of an individual's survival process and a covariate's effect can beidentified by separating two components namely a time scale change on hazard progression and a relative hazards ratio.
Bayesian approach might be a convenient approximation and simplification to bypass the prior distribution of the nuisance parameters since the maximum likelihood method as a general technique might fail when there is a presence of many nuisance parameters.
Recently, semi-parametric Bayesian analyses of proportional hazards model has become computationally feasible due to modern technology and advancement in computing techniques such as the Gibbs sampler and other MCMC methods.Arjas and Gasbarra [4] considered simple right censored data with a common unknown hazard rate in which the hazard rate is modelled nonparametrically.In their study, the sample paths of the hazard rate were generated from the posterior distribution using Gibbs Sampler.
Sinha and Dey [5] provided an overview of Bayesian semiparametric methods for the Cox model.One of the advantages of using Bayesian method is that we can join the baseline hazard and the regression coefficients that can be used accurately to compute the target posterior quantities using MCMC simulation techniques.They investigated the potential of Bayes methods for the analysis of survival data using semiparametric models based on either the hazard or the intensity function.The nonparametric part of every model is assumed to be a realization of a stochastic process.The parametric part, which may include a regression parameter or a parameter quantifying the heterogeneity of a population, is assumed to have a prior distribution with possibly unknown hyperparameters.
Ibrahim and Sinha [6] developed a class of semiparametric informative prior distributions for the Cox model.They specified a non-parametric prior for the baseline hazard rate and a parametric prior for the regression coefficients via the development of novel Markov chain Monte Carlo (MCMC) techniques for sampling from the posterior distribution of the parameters.Their approach seems to be a useful approach for this problem since it is difficult to specify meaningful prior distributions for the parameters in each model task and requiring contextual interpretations of a large number of parameters.
In this article, we have used the proportional hazards model that has been used extensively since 1972.In a fully parametric model, the lifetime distribution has been assumed to belong to a family of parametric distributions and reducing the regression problem to estimating the parameters from the data.
Proportional hazards model can be modelled from classical perspective by obtaining the partial likelihood approach in estimating the unknown parameters.Recently, the model has been the most common from a Bayesian perspective.It has been widely used in survival analysis for such realistic models.Fully Bayesian computations of multi-level or hierarchical model are now possible using simulation techniques.Gibbs Sampling is one of the new numerical algorithms which allow obtaining samples from posterior of interest and this new development has motivated the use of Bayesian methods in survival analysis.Gibbs Sampler is one of the best known Markov Chain Monte Carlo (MCMC) sampling in computational literature.Many authors appeared explaining in their papers the MCMC-based Bayesian analysis as this task would typically require significantly scientific programming ability and facility with such random number generators.Specialized software packages called BUGS [7] are created for implementing MCMC-based analyses of full probability models.These packages will treat all unknowns as random variables.This paper describes the use of freely available software for the analysis of complex statistical models using MCMC techniques, called WinBUGS [8] and R programming, in the context of PHM.
The proposed baseline hazards function in Cox model is slightly different compared to the original Cox model which has been proposed by Kalbfleisch [9].He proposed gamma process prior as the baseline hazard function in Cox model and he fixed the value of c (a specification of the weight attached to the guess) and r (guess at the failure rate per unit time).In this paper, we proposed both c and r to have their own distributions and to make the proposed model more flexible, we assume c and r to have gamma and uniform distributions, respectively.

The Weibull Proportional Hazards Model
A parametric survival model is one in which the survival time is assumed to follow a known distribution.Examples of distributions that are commonly used for survival time are the Weibull distribution, the Exponential distribution, the Log-logistic distribution, the Lognormal distribution and the generalized gamma distribution.
In parametric survival analysis, all parts of the model are specified for both the hazard function and the effect of any covariates.The estimation is easier and estimated survival curves are smoother as they draw information from the whole data.In addition, it is possible to do more sophisticated analyses with parametric models, such as introducing random effects or using Bayesian methodology to pool sources of information.
In parametric survival model, time T is assumed to follow some distribution whose probability density function, () can be expressed in terms of unknown parameters.The corresponding survival and hazard function can be determined once a probability density function is specified for survival time.
The Weibull model is the most widely used parametric survival model.Survival time T has a Weibull distribution of (|, ), with its survival function is given by (|, ) = exp (−  ) (2.1) and the probability density function is 2) where  is the scale parameter,  is the shape parameter, and both  and  are positive ( > 0,  > 0).The important feature of Weibull distribution is the failure rate.The failure rate of a subject is decreasing when  < 1, it is increasing when  > 1 and it is constant when  = 1, which indicates an exponential distribution.Therefore the hazard rate is defined as 4) The likelihood function in survival analysis is more complicated.The likelihood function is only related to the density function if there is no censored data; else the likelihood depends on the density function and the survival function.The likelihood function for Weibull distributed data,   ,  = 1,2, … ,  with random censoring (2.5) and the log-likehood function is (2.6) The Weibull hazard function is defined as ℎ(|, ) =  −1 while a Weibull PH model is then be defined by reparameterizing  as

The Cox Proportional Hazards Model
The Cox proportional hazards (PH) model is known as a semi-parametric model because the form of the baseline hazard is not specified, only the form of the effect of covariates.The Cox proportional hazards (PH) model can be written in terms of the hazard model formula shown below: ℎ(, ) = ℎ 0 ()exp(  ) (2.7)This model gives an expression for the hazard at time t for an individual with a given specification of a set of explanatory variables denoted by X where  = � 1 ,  2 , … ,   �.
An important feature of this formula is that the baseline hazard is a function of t, which concerns the proportional hazards (PH) assumption and does not involve the ′.It is shown that the exponential expression in contrast involves the ′ but does not involve t.The ′ here are called time-independent variables and are defined to be any variable whose value for a given individual does not change over time.
The Cox PH model has the property that if all of the ′ are zero then the formula will reduce to the baseline hazard function.This property of the Cox PH model is the reason why ℎ 0 () is called the baseline function.In other words, the Cox PH model will reduce to the baseline hazard function when there are no ′ in the model.
The survival function, the likelihood function and the log likelihood function for the Cox PH model are as follows: (, ) =  0 (, ) exp�  � (2.8) (2.9)

Maximum Likelihood Estimation (MLE)
The idea behind maximum likelihood parameter estimation is to determine the parameters that maximize the probability (likelihood) of the sample data.From a statistical point of view, the method of maximum likelihood is considered to be more robust (with some exceptions) and yields estimators with good statistical properties.In other words, MLE methods are more versatile and can be applied to most models and to different types of data.In addition, they provide efficient methods for quantifying uncertainty through confidence bounds.Although the methodology for maximum likelihood estimation is simple, the implementation is mathematically intense.Using today's computer power, however, mathematical complexity is not a big obstacle.The MLE methodology is presented next.
The maximum likelihood estimator (MLE) for the proportional hazards model with partly interval-censored data was proposed by Kim [11].The MLEs of the regression parameter and the cumulative hazard function shows consistency and asymptotically normal under an appropriate regularity conditions.
The Newton Raphson method is one of the popular methods in maximizing the likelihood function and estimating the unknown parameters.The Newton-Raphson method is deterministic because there is no element of randomness in optimisation.It is also an iterative procedure because it consists of a series of iterations, with an improved estimation in each of the iteration.
Generally, if we have a parameter vector  of dimension p and wish to estimate  � which maximize the likelihood function (), the algorithm is as follows: Go back to step (c) and repeat until converge.Some notations have been used are: •  () is the value of the parameters at the k th iteration of the routine • the score function is • the information matrix is • and   is the q th element of The partial likelihood is useful especially when it is appreciable simpler than the full likelihood, for example when it involves only the parameter of interest and not nuisance parameters [10].Later, Cox (1972) [2] gave the equation below as a likelihood for inference about .

A Bayesian MCMC Approach
The analysis of counting process in survival data is usually based on the modelling of the intensity function.The random variable T denote the survival time of an individual and hence, the survival curve is given by () = ( > ).
The baseline hazard rate  0 () is defined as the following limit, common to all subjects,  0 () = lim d→0 (≤<+d) d (3.1) by assuming that T is continuous and considering the calculation of the probability of an event happening over some finite time interval [,  + ).The baseline cumulative hazard rate can be defined as The processes   () can be observed for subjects  = 1,2, … , , that count the number of failures which have occurred up to time t.The counting process is constant and takes the value zero between failures and jumps one unit at each failure time.The new failure rate is then seen to be   () =   ()(|  ).The intensity function is characterized as the probability of an event of interest occurs in the small time interval, given that it has not happened before.
The corresponding intensity process   () is formulated as follows: 3) where   () is the increment of   over the small time interval[,  + ), and F t− represents the available data just before time t.Thus, the above equation can be written as   () = (  () |F t− ) (3.4)If subject i is observed to fail during this time interval,   () will take the value 1, otherwise   () is equal to zero.Hence (  () |F t− ) corresponds to the probability of subject i failing in the time interval [,  + ).As  approaches to zero (assume that time is continuous) this probability becomes the instantaneous hazard at time t for subject i.This is assumed to have the proportional hazards form as follows: () =   () 0 ()exp(  ) (3.5) where   () is an observed process and take the value 1 or 0 according to whether or not subject i is observed at time t.As we can see that  0 ()exp(  ) is the familiar Cox regression model.Thus we have observed data  =   (),   (),   ;  = 1,2, … ,  and unknown parameters  and Λ 0 () = ∫  0 ()  0 , the latter to be estimated nonparametrically.The joint posterior distribution for the above model can be defined as (, Λ 0 ( )|)~(|, Λ 0 ( ))()(Λ 0 ( )) Here, we need to specify for BUGS, the form of the likelihood (|, Λ 0 ( )) and prior distributions for  and Λ 0 ( ).Under non-informative censoring, the likelihood of the data is proportional to The counting process increments   () in the time interval [,  + ) are assumed to be independent Poisson random variables with means,   (), where   () ~ Poisson(  ())   () can be written as,   () =   ()exp(  )Λ 0 () (3.8)where Λ 0 () = Λ 0 () is the increment or jump in the integrated baseline hazard function occurring during the time interval [,  + ).
In this paper, the proposed baseline hazards function for Cox model is slightly different compared to the original Cox model which used gamma process prior as the baseline hazards function with both c and r are fixed values.Since the conjugate prior for the Poisson mean is the gamma distribution, it would be convenient if Λ 0 ( ) were a process in which the increments Λ 0 () are assumed to be the conjugate independent increments prior.
In this paper, we modified the increments to be the product of Λ 01 * () and Λ 02 * () in which we proposed the increments as Λ 0 () = Λ 01 * ()Λ 02 * () where the increments Λ 0 () is a product between Λ 01 * () and Λ 02 * ().Λ 01 * () is assume to have gamma distribution with hyperparameters mu and c as the shape and scale parameters, respectively and can be written as Λ 01 * () ~ Gamma(, ) Here we suggested that shape, mu and scale, c to have its own distribution in which mu has gamma distribution with mean 10 and variance 0.1 while c also has gamma distribution with mean 1 and variance 10.We set Λ 02 * () =  where r is a guess at the failure rate per unit time and has a uniform distribution, while dt is the size of the time interval.
This paper also proposed both c and r to have its own distributions rather than assuming the values to be arbitrary constant values.In order to make the proposed model more flexible, we assume c and r to have gamma and uniform distributions, respectively.The proposed full BUGS model is attached in the appendix.

RESULTS & DISCUSSION
We consider leukemia data where the effect of 6-MP (6-Mercaptopurine) therapy on the duration of remissions induced by adrenal corticosteroids has been studied as a model for testing of new agents [13].Randomly, patients in remission were assigned to maintenance therapy with either 6-MP or placebo.The median duration of 6-MP-maintained complete remissions was 33 weeks and for placebo, 9 weeks.While the study was in progress, a sequential experimental design was used in analyzing remission times.This resulted in the study being stopped after analysis of the remission times of 21 pairs of patients (42 patients).

An analysis of Maximum Likelihood Estimation
We use the leukemia data to estimate the unknown parameters in Weibull proportional hazards model using maximum likelihood estimation approach.Table 1 shows the MLE estimates of the hazard function, cumulative hazard function, density function and survivor function for Weibull proportional hazards model.
Newton Raphson method is the popular method in maximizing the likelihood function and estimating the unknown parameters.It takes six iterations to converge in the analysis and the results are tabulated in Table 1.The unknown parameter beta of Weibull proportional hazard model is 1.731 and the standard error is 0.413.The likelihood ratio test is 19.6, while the log likelihood is -106.58.Scale and shape parameters for Weibull proportional hazards model are 17.850 and 1.366, respectively.As can be seen, the p-value is very small and we can conclude that this is a significant value.
With the same data, we proceed to estimate the parameter for Cox regression model.The analysis involved the use of maximum likelihood estimation and maximum penalized partial likelihood estimation.Using Newton Raphson method as the maximum likelihood estimation, we obtain the estimated parameters and were tabulated in Table 2.The Newton Raphson iteration converges in four iterations and gives the estimated beta 1.659 with standard error 0.423 and log likelihood -93.599.With the same number of iterations using maximum penalized partial likelihood estimation, we obtained the estimated beta 1.650 with standard error 0.422 and log likelihood -86.869 and the results are also tabulated in Table 2.
As can be seen, the larger value of the log likelihood of maximum penalized likelihood estimation indicate better fitting models compared to the value of the log likelihood of maximum likelihood estimation using Newton Raphson method.This is because the partial likelihood is simpler than the full likelihood and it involves only the parameters of interest and not nuisance parameters.

An analysis of Bayesian modelling process
Using the same data, we continue our analysis with Bayesian approach using statistical packages WinBUGS and R. The choice of hyperparameters and initial values are not sensitive to the estimation of the parameters.Five chains with the same starting values were carried out simultaneously.10,000 iterations is performed for each chain after 1000 iterations for burn-in, and one out of every 100th values is used.  .Log likelihood for each chain has also tabulated in the same table.As we can see that the standard deviation and MCMC error reduced as we increased the number of chain.The smaller the standard deviation and MCMC error, the best results we get for the parameter estimation.The posterior distribution of the final sample after generating MCMC and an acceptable convergence to the stationary distribution for all hyperparameters of the model is shown in Figure 1.The generated observations of the trace plots are more convincing in terms of convergence, with all generated values within a parallel zone and no obvious tendencies or periodicities.The densities provide a graphical representation of the posterior densities estimate for each node.Figure 1 shows trace plots and posterior densities for 50,000 from each of five generated samples (10,000 for each chain).We analyze the data using WinBUGS and R programming to Cox regression model with gamma process prior as the baseline hazards function.Five chains with the same starting values were carried out simultaneously.After 1000 iterations for burn-in, 10,000 iterations is performed for each chain and one out of every 100th values is used.The choice of hyperparameters and initial values are not sensitive to the estimation of the parameters.
The posterior mean, standard deviation, MCMC error, and median with a 95% posterior credible interval and the deviance information criterion, DIC for original Cox regression model (Gamma process is used as a prior belief for the baseline hazard) and proposed baseline hazards function for Cox regression model for different number of chain are shown in Table 4.As can be seen in the Table 4, the proposed baseline hazards function for Cox regression model gives better fit compared to the original baseline hazard function using gamma process prior of Cox regression model.The deviance information criterions, DIC for original gamma process prior of Cox regression model in each chain are higher compared to the proposed baseline hazards function of Cox regression model.The lower values of the criterion indicate better fitting models and this value shows that the proposed baseline hazards function of Cox regression model gives better fit compared to the original gamma process prior of Cox regression model.We also compare the log likelihood for both Cox regression models and found that the log likelihood for original Cox regression model is higher compared to log likelihood for proposed Cox regression model in each chain.Figure 2 shows trace plots and posterior densities for 50,000 from each of five generated samples (10,000 for each chain).The densities provide a graphical representation of the posterior densities estimate for each node.

CONCLUSION
We proposed a Bayesian method to fit more flexible survival models for non-informative censored data.Using WinBUGS and R software, we proved that by using the proposed Cox regression model the DIC value is smaller than the Gamma process prior for the nonparametric part in the original Cox regression model.This proposed model is simple and alternative to the maximum likelihood method or the Gamma process prior.
WinBUGS is a tool for analyzing survival data in a Bayesian framework using Markov Chain Monte Carlo (MCMC).While in R, we use R2WinBUGS package to make it possible to call a BUGS model, summarize inferences and convergence in a table and graph, and save the simulations in arrays for easy access in R.
The results of the statistical analysis in the example are consistent with those obtained from previous analysis.Bayesian models can be compared using the deviance information criterion (DIC), which its posterior distributions have been obtained using MCMC.DIC has been implemented as a tool in the BUGS software package.Bayesian inference has several advantages particularly in the flexibility of model-building for complex data over the frequentist approaches.Bayesian approach enables us to make exact inference for any sample size based on the posterior distribution.

Fig. 1
Fig.1MCMC diagnostic plots for Weibull Proportional Hazards Regression model using Gibbs Sampling when generating five chains with 10,000 iterations in each chain.

( a )Fig. 2
Fig.2MCMC diagnostic plots for Cox Regression model (gamma process prior baseline hazard function and proposed baseline hazard function) using Gibbs Sampling when generating five chains with 10,000 iterations for each chain.

Table 1
Summaries of parameter estimations for Weibull Proportional hazards model using Maximum Likelihood Estimation (Newton Raphson Method)

Table 3
Posterior statistics mean for Weibull Proportional Hazards Regression Model with different number of chain (10,000 from each of generated samples).(a)Summaries of posterior statistics   =  � +   and the lower values of the criterion indicate better fitting models.Table3shows different values of posterior means for parameter estimation and different DIC depends on the number of chain.We know that  � = −2 × log likelihood thus the log likelihood =

Table 4
Posterior statistics mean with different number of chain (10,000 from each of generated samples).