How to implement a rescaled random-effects excess hazard regression model to handle situations involving inappropriate life tables.

library(xhaz)

Introduction

xhaz is an R package for fitting excess hazard regression models, with or without the assumption of proportional hazards. In the presence of hierarchical data, inter-cluster heterogeneity can affect excess hazard estimates. To address this issue, the mexhaz R package implements multilevel excess hazard regression models. If the user wants to check the validity of the non-comparability bias, an easy way is to use the mexhazLT function of the xhaz R package. The user can also specify whether the framework is consistent with classical excess hazard modeling, i.e., assuming that the expected mortality of the individuals studied is appropriate, using the pophaz argument is equivalent to classical. The user can also consider a different framework by specifying that pophaz equals rescaled: the expected mortality available in the life table is not accurate, i.e., there is a non-comparability source of bias with respect to the expected mortality of the study population Goungounga et al. (2023).

(Here’s the abstract from Goungounga et al. (2023) paper: In the presence of competing causes of event occurrence (e.g., death), the interest might not only be in the overall survival but also in the so-called net survival, that is, the hypothetical survival that would be observed if the disease under study were the only possible cause of death. Net survival estimation is commonly based on the excess hazard approach in which the hazard rate of individuals is assumed to be the sum of a disease-specific and expected hazard rate, supposed to be correctly approximated by the mortality rates obtained from general population life tables…

A related software package can be found at a gitlab webpage or at https://CRAN.R-project.org/package=xhaz.

Installation

The most recent version of xhaz can be installed directly from the cran repository using

install.packages("xhaz")

xhaz depends on the stats, survival, optimParallel, numDeriv, statmod, gtools and splines packages which can be installed directly from CRAN.

It also utilizes survexp.fr, the R package containing the French life table. For example, to install survexp.fr follow the instructions available at the RStudio page on R and survexp.fr.

First, install the R package via github.

devtools::install_github("rstudio/survexp.fr")

Then, when these other packages are installed, please load the xhaz R package.

library(xhaz)

Fitting an classical excess hazard regression model with a cubic B-splines for the baseline hazard as in mexhaz

For this illustration, we used a simulated dataset from the original paper by Goungounga et al. (2023). This dataset consists of 4,978 patients with information on their treatment arm and clinical center ID, as this information may affect the excess mortality rate. The US life table can be used to estimate the model parameters. mexhazLT() function will be used with argument (`pophaz = classic).

data("breast", package = "xhaz")

head(breast)
#>        temps statut      age     agecr       date SEX armt hosp dept
#> 3   5.050550      1 30.95628 -2.213589 1985-12-23   2    0    3   09
#> 8   1.415929      1 35.20680 -1.788538 1985-04-30   2    1    8   81
#> 11  1.904082      1 31.80505 -2.128713 1985-10-15   2    0   11   06
#> 16 15.000000      0 42.52067 -1.057150 1985-09-16   2    1   16   10
#> 17  2.504911      0 32.01450 -2.107767 1985-01-29   2    0   17   94
#> 18  5.319706      0 31.77941 -2.131276 1985-06-04   2    1   18   55
dim(breast)
#> [1] 4978    9

# The life table to be used is survexp.us. Note that SEX is coded 2 instead of female in survexp.us.

breast$sexe <- "female"
fit.haz <- exphaz(
               formula = Surv(temps, statut) ~ 1,
               data = breast, ratetable = survexp.us, 
               only_ehazard = FALSE,
               rmap = list(age = 'age', sex = 'sexe',
                           year = 'date'))



breast$expected <- fit.haz$ehazard
breast$expectedCum <- fit.haz$ehazardInt


qknots <- quantile(breast[breast$statut==1,]$temps, probs=c(1:2/3))
mod.bs <- mexhazLT(
  formula = Surv(temps, statut) ~ agecr + armt,
  data = breast, ratetable = survexp.us, degree = 3,
  knots = qknots, expected = "expected", 
  expectedCum = "expectedCum",
  base = "exp.bs", pophaz = "classic")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1 -1 -1 -1 -1 -1  0  0
#> Function Value
#> [1] 11032
#> Gradient:
#> [1] -1281.60395  -490.49700  -810.60363  -112.90836    72.86473   103.05848
#> [7] -1994.09940  -415.61972
#> 
#> iteration = 39
#> Parameter:
#> [1] -1.5083019  0.8413747  0.5236963 -0.2330524 -0.3193139 -1.1142807  0.3759031
#> [8] -0.3239171
#> Function Value
#> [1] 9784.333
#> Gradient:
#> [1] -0.0003943571  0.0013787940 -0.0010131771 -0.0001036824 -0.0007712515
#> [6]  0.0007721412 -0.0000509317  0.0001582521
#> 
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#> 
#> 
#> Data
#>  Name N.Obs.Tot N.Obs N.Events N.Clust
#>  data      4978  4978     4363       1
#> 
#> Details
#>  Iter Eval   Base Nb.Leg Nb.Aghq Optim Method Code    LogLik Total.Time
#>    39  289 exp.bs     20      10   nlm    ---    1 -9784.333       2.13
                  
mod.bs
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast, 
#>     expected = "expected", expectedCum = "expectedCum", pophaz = "classic", 
#>     base = "exp.bs", degree = 3, knots = qknots, ratetable = survexp.us)
#> 
#> Coefficients:
#>            Estimate    StdErr  z.value   p.value    
#> Intercept -1.508302  0.088648 -17.0145 < 2.2e-16 ***
#> BS3.1      0.841375  0.131143   6.4157 1.402e-10 ***
#> BS3.2      0.523696  0.090615   5.7793 7.499e-09 ***
#> BS3.3     -0.233052  0.188542  -1.2361    0.2164    
#> BS3.4     -0.319314  0.255005  -1.2522    0.2105    
#> BS3.5     -1.114281  0.248570  -4.4828 7.368e-06 ***
#> agecr      0.375903  0.013603  27.6334 < 2.2e-16 ***
#> armt      -0.323917  0.031372 -10.3249 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> log-likelihood: -9784.3328 (for 8 degree(s) of freedom)
#> 
#> number of observations: 4978, number of events: 4363

Fitting a rescaled excess hazard regression model with a cubic B-splines for the baseline hazard as in mexhaz

The new parameter to be added to mexhazLT() function is pophaz=“rescaled”: it allows to rescale the life table available for the estimation of the excess hazard parameters. This model concerns that proposed by Goungounga et al (2016).

mod.bs2 <- mexhazLT(
  formula = Surv(temps, statut) ~ agecr + armt,
  data = breast, ratetable = survexp.us, degree = 3,
  knots = qknots, expected = "expected",
  expectedCum = "expectedCum",
  base = "exp.bs", pophaz = "rescaled")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1 -1 -1 -1 -1 -1  0  0  0
#> Function Value
#> [1] 11032
#> Gradient:
#> [1] -1281.60395  -490.49700  -810.60363  -112.90836    72.86473   103.05848
#> [7] -1994.09940  -415.61972  -140.82773
#> 
#> iteration = 51
#> Parameter:
#> [1] -1.5349736  0.8585781  0.5389037 -0.2782919 -0.3520908 -1.3452699  0.3591276
#> [8] -0.3373832  0.8067923
#> Function Value
#> [1] 9783.523
#> Gradient:
#> [1] -0.0025525541 -0.0004893081 -0.0006475602 -0.0002819434 -0.0002892193
#> [6] -0.0005962925 -0.0007585186 -0.0011386874 -0.0002582965
#> 
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#> 
#> 
#> Data
#>  Name N.Obs.Tot N.Obs N.Events N.Clust
#>  data      4978  4978     4363       1
#> 
#> Details
#>  Iter Eval   Base Nb.Leg Nb.Aghq Optim Method Code    LogLik Total.Time
#>    51  401 exp.bs     20      10   nlm    ---    1 -9783.523       2.93
                  
mod.bs2
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast, 
#>     expected = "expected", expectedCum = "expectedCum", pophaz = "rescaled", 
#>     base = "exp.bs", degree = 3, knots = qknots, ratetable = survexp.us)
#> 
#> Coefficients:
#>             Estimate    StdErr  z.value   p.value    
#> Intercept  -1.534974  0.094208 -16.2935 < 2.2e-16 ***
#> BS3.1       0.858578  0.136119   6.3076 2.835e-10 ***
#> BS3.2       0.538904  0.094530   5.7009 1.192e-08 ***
#> BS3.3      -0.278292  0.202270  -1.3758   0.16887    
#> BS3.4      -0.352091  0.281798  -1.2494   0.21150    
#> BS3.5      -1.345270  0.342630  -3.9263 8.626e-05 ***
#> agecr       0.359128  0.019167  18.7363 < 2.2e-16 ***
#> armt       -0.337383  0.034268  -9.8455 < 2.2e-16 ***
#> log(Alpha)  0.806792  0.434871   1.8552   0.06356 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> log-likelihood: -9783.523 (for 9 degree(s) of freedom)
#> 
#> number of observations: 4978, number of events: 4363

Fitting a random-effects excess hazard regression model with a cubic B-splines for the baseline hazard as in mexhaz

As in mexhaz, the new parameter to be added to the mexhazLT() function is random=“hosp”: it allows to specify the variable indicating the cluster levels. However, pophaz is set to be equal to “classic”. This excess hazard regression model is the one proposed by Charvat et al (2016). In the output, loglik corresponds to the total log-likelihood including the sum of the cumulative expected hazards, which is often removed in classical excess hazard regression models because it is considered a nuisance parameter.


mod.bs3 <- mexhazLT(
  formula = Surv(temps, statut) ~ agecr + armt,
  data = breast, ratetable = survexp.us, degree = 3,
  knots = qknots, expected = "expected",
  expectedCum = "expectedCum",
  base = "exp.bs", pophaz = "classic", random = "hosp")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  0.0  0.0  0.1
#> Function Value
#> [1] 10129.43
#> Gradient:
#> [1]   -39.69633  -148.95592  -395.16138    18.72774   118.58665   119.00834
#> [7] -2896.28313   329.60901    39.47875
#> 
#> iteration = 53
#> Parameter:
#> [1] -1.90783127  1.00354112  1.11855695  0.90280950  0.65681302  0.05890742
#> [7]  0.49124655 -0.45036907 -0.20709299
#> Function Value
#> [1] 8998.729
#> Gradient:
#> [1]  0.0026219409  0.0022965273  0.0006163271  0.0006457412 -0.0012478267
#> [6]  0.0002801244 -0.0007330527 -0.0002019078  0.0019390427
#> 
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#> 
#> Computation of the covariance matrix of the shrinkage estimators
#> 
#> Data
#>  Name N.Obs.Tot N.Obs N.Events N.Clust
#>  data      4978  4978     4363     100
#> 
#> Details
#>  Iter Eval   Base Nb.Leg Nb.Aghq Optim Method Code    LogLik Total.Time
#>    53  422 exp.bs     20      10   nlm    ---    1 -8998.729      17.36
                  
mod.bs3
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast, 
#>     expected = "expected", expectedCum = "expectedCum", pophaz = "classic", 
#>     base = "exp.bs", degree = 3, knots = qknots, random = "hosp", 
#>     ratetable = survexp.us)
#> 
#> Coefficients:
#>                 Estimate    StdErr  z.value   p.value    
#> Intercept      -1.907831  0.122002 -15.6376 < 2.2e-16 ***
#> BS3.1           1.003541  0.131641   7.6233 2.465e-14 ***
#> BS3.2           1.118557  0.093232  11.9976 < 2.2e-16 ***
#> BS3.3           0.902810  0.192646   4.6864 2.781e-06 ***
#> BS3.4           0.656813  0.257079   2.5549  0.010622 *  
#> BS3.5           0.058907  0.251562   0.2342  0.814855    
#> agecr           0.491247  0.014198  34.5990 < 2.2e-16 ***
#> armt           -0.450369  0.032262 -13.9596 < 2.2e-16 ***
#> hosp [log(sd)] -0.207093  0.074504  -2.7796  0.005442 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> log-likelihood: -8998.7292 (for 9 degree(s) of freedom)
#> 
#> number of observations: 4978, number of events: 4363

Fitting a rescaled random-effects excess hazard regression model with a cubic B-splines for the baseline hazard as in mexhaz

As in mexhaz, the new parameter to be added to the mexhazLT() function is random=“hosp”: it allows to specify the variable indicating the cluster levels. However, pophaz is set to be equal to “rescaled” (`pophaz = "rescaled"). This excess hazard regression model is the one proposed by Goungounga et al (2023).


mod.bs4 <- mexhazLT(
  formula = Surv(temps, statut) ~ agecr + armt,
  data = breast, ratetable = survexp.us, degree = 3,
  knots = qknots, expected = "expected", 
  expectedCum = "expectedCum",
  base = "exp.bs", pophaz = "rescaled", random = "hosp")
#> iteration = 0
#> Step:
#>  [1] 0 0 0 0 0 0 0 0 0 0
#> Parameter:
#>  [1] -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  0.0  0.0  0.0  0.1
#> Function Value
#> [1] 10129.43
#> Gradient:
#>  [1]   -39.69633  -148.95592  -395.16138    18.72774   118.58665   119.00834
#>  [7] -2896.28313   329.60901   -83.32170    39.47875
#> 
#> iteration = 63
#> Parameter:
#>  [1] -1.9735698  1.0336144  1.1548583  0.8953165  0.6287633 -0.1876885
#>  [7]  0.4726014 -0.4743774  1.0226142 -0.1557342
#> Function Value
#> [1] 8995.933
#> Gradient:
#>  [1]  3.686699e-06  1.091097e-04  5.260753e-04  3.165042e-04 -5.493348e-04
#>  [6]  2.619345e-04  1.091394e-04 -7.566996e-04 -4.126733e-04 -7.275958e-05
#> 
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#> 
#> Computation of the covariance matrix of the shrinkage estimators
#> 
#> Data
#>  Name N.Obs.Tot N.Obs N.Events N.Clust
#>  data      4978  4978     4363     100
#> 
#> Details
#>  Iter Eval   Base Nb.Leg Nb.Aghq Optim Method Code    LogLik Total.Time
#>    63  553 exp.bs     20      10   nlm    ---    1 -8995.933      22.91
                  
mod.bs4
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast, 
#>     expected = "expected", expectedCum = "expectedCum", pophaz = "rescaled", 
#>     base = "exp.bs", degree = 3, knots = qknots, random = "hosp", 
#>     ratetable = survexp.us)
#> 
#> Coefficients:
#>                 Estimate    StdErr  z.value   p.value    
#> Intercept      -1.973570  0.131548 -15.0027 < 2.2e-16 ***
#> BS3.1           1.033614  0.138130   7.4829 7.261e-14 ***
#> BS3.2           1.154858  0.098864  11.6813 < 2.2e-16 ***
#> BS3.3           0.895317  0.207525   4.3143 1.601e-05 ***
#> BS3.4           0.628763  0.291275   2.1587 0.0308767 *  
#> BS3.5          -0.187689  0.317852  -0.5905 0.5548619    
#> agecr           0.472601  0.016678  28.3364 < 2.2e-16 ***
#> armt           -0.474377  0.035524 -13.3537 < 2.2e-16 ***
#> log(Alpha)      1.022614  0.273560   3.7382 0.0001854 ***
#> hosp [log(sd)] -0.155734  0.078446  -1.9853 0.0471164 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> log-likelihood: -8995.9329 (for 10 degree(s) of freedom)
#> 
#> number of observations: 4978, number of events: 4363

We can compare the output of these two models using AIC or BIC criteria.

compared_models <- list(mod.bs,mod.bs2, mod.bs3, mod.bs4)
names(compared_models) <- c("mod.bs","mod.bs2", "mod.bs3", "mod.bs4")

sapply(compared_models, function(i) {
  AIC(i)
})
#>   mod.bs  mod.bs2  mod.bs3  mod.bs4 
#> 19584.67 19585.05 18015.46 18011.87

A statistical comparison between two nested models can be performed with a likelihood ratio test calculated by function anova method implemented in xhaz.

For example, suppose we want to test whether we can drop the rescaling parameter between the different excess hazard regression models.

As in survival package, we compare the models using anova(), i.e.,

anova(mod.bs,mod.bs2)
#> Assumption: Model 1 nested within Model 2
#> 
#> Likelihood ratio test
#> Model 1: 
#> Surv(temps, statut) ~ agecr + armt
#> Model 2: 
#> Surv(temps, statut) ~ agecr + armt
#>      Model.df  loglik      df Chisq Pr(>Chisq)
#> [1,]      8.0 -9784.3      NA    NA         NA
#> [2,]      9.0 -9783.5     1.0  0.81     0.2031
anova(mod.bs3,mod.bs4)
#> Assumption: Model 1 nested within Model 2
#> 
#> Likelihood ratio test
#> Model 1: 
#> Surv(temps, statut) ~ agecr + armt
#> Model 2: 
#> Surv(temps, statut) ~ agecr + armt
#>      Model.df  loglik      df Chisq Pr(>Chisq)  
#> [1,]      9.0 -8998.7      NA    NA         NA  
#> [2,]     10.0 -8995.9     1.0 2.796    0.01804 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that it is the user’s responsibility to provide properly nested hazard models for the LRT to be valid. The results suggest that by correcting for between-cluster heterogeneity and non-comparability bias with a rescaled random effects excess hazard regression model, the user will be able to provide more accurate estimates of net survival.

Plot of net survival and excess hazard for different models

One could be interested in the prediction of net survival and excess hazard for the individual with the same characteristics as individual 1 in the breast dataset (age 30.95, armt equals 0) as performed in mexhaz.


predict_mod <- predict(mod.bs,
                        time.pts=seq(0.1,10,by=0.1),
                        data.val=data.frame(agecr = breast[1,]$agecr,
                                            armt = breast[1,]$armt))


predict_mod2 <- predict(mod.bs2,
                        time.pts=seq(0.1,10,by=0.1),
                        data.val=data.frame(agecr = breast[1,]$agecr,
                                            armt = breast[1,]$armt))

predict_mod3 <- predict(mod.bs3,
                        time.pts=seq(0.1,10,by=0.1),
                        data.val=data.frame(agecr = breast[1,]$agecr,
                                            armt = breast[1,]$armt, 
                                            hosp = breast[1,]$hosp))

predict_mod4 <- predict(mod.bs4,
                        time.pts=seq(0.1,10,by=0.1),
                        data.val=data.frame(agecr = breast[1,]$agecr,
                                            armt = breast[1,]$armt,
                                            hosp = breast[1,]$hosp))

old.par <- par(no.readonly = TRUE)

par(mfrow = c(1, 2))

plot(predict_mod$results$time.pts, predict_mod$results$hazard,
     type = "l", lwd = 2, xlab = 'Time (years)',
     ylab = "excess hazard")
lines(predict_mod2$results$time.pts, predict_mod2$results$hazard,
      type = "l", lwd = 2, col = "blue", lty = 2)
lines(predict_mod3$results$time.pts, predict_mod3$results$hazard, 
      type = "l", lwd = 2, col = "red")
lines(predict_mod4$results$time.pts, predict_mod4$results$hazard,
      type = "l", lwd = 2, col = "green", lty = 2)


plot(predict_mod$results$time.pts, predict_mod$results$surv,
     type = "l", lwd = 2, xlab = 'Time (years)',
     ylab = "Net survival", ylim = c(0,1))
lines(predict_mod2$results$time.pts, predict_mod2$results$surv,
      type = "l", lwd = 2, col = "blue", lty = 2)
lines(predict_mod3$results$time.pts, predict_mod3$results$surv,
      type = "l", lwd = 2, col = "red")
lines(predict_mod4$results$time.pts, predict_mod4$results$surv,
      type = "l", lwd = 2, col = "green",lty = 2)

legend(
  "topright",
  legend = c("mod.bs",
             "mod.bs2",
             "mod.bs3",
             "mod.bs4"),
  lty = c(1, 2 , 1, 2),
  lwd = c(2, 2 , 2, 2),
  col = c("black", "blue", "red", "green")
)

par(old.par)

License

GPL 3.0, for academic use.

Acknowledgments

We are grateful to the members of the CENSUR Survival Group for their helpful comments.