# Prediction Modeling with the Cox model - all about the baseline hazard

### Questions about the Cox model you were afraid to ask

Did you regularly ask yourself the following questions as, how is the baseline hazard function calculated and where can I use it for? What is the difference between `centered=TRUE`

and `centered=FALSE`

in the `basehaz`

function? What is the result of the `predict.coxph`

function and what is the difference between the options `type=c("lp", "risk", "expected", "terms", "survival")`

in this function? What is the difference between `predict.coxph`

and `survfit`

? Where can I use the `termplot`

function for and what is the difference between `terms=1`

and `terms=2`

in this function? These questions and more will be answered here!

### How is the baseline hazard function calculated and where can I use it for?

The Cox model can be defined as follows,

\[ h(t) = h_0(t) \times exp(b_1x_1 + b_2x_2 + ... + b_px_p) \]
You could say that it consists of two separate parts, \(h_0(t)\) and \(exp(b_1x_1 + b_2x_2 + ... + b_px_p)\). The first part is called the baseline hazard function and the second part consists of the coefficients and predictor values (also called linear predictor, LP in short). If you know both of these parts you can produce the same results as the `basehaz`

, `survfit`

, `predict.coxph`

and `termplot`

functions and get a better understanding of what these functions do.

the Cox model can be applied in R using the `coxph`

function. When you apply it you only get information about the \(exp(b_1x_1 + b_2x_2 + ... + b_px_p)\) part and not about the \(h_0(t)\) part as we will see when we fit a Cox model.

We generate a dataset with some example data that has no further meaning but is more used to compare results.

```
time <- c(1, 3, 5, 6, 2, 7, 9, 11)
status <- c(1, 0, 1, 1, 1, 0, 1, 1)
sex <- c(1, 1, 1, 1, 0, 0, 0, 0)
age <- c(57, 52, 48, 42, 39, 31, 26, 22)
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
summary(fit_cox)
```

```
## Call:
## coxph(formula = Surv(time, status) ~ age + sex, data = df, method = "breslow")
##
## n= 8, number of events= 6
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.6338168 1.8847908 0.3917450 1.618 0.106
## sex -7.4935992 0.0005566 5.1135367 -1.465 0.143
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.8847908 0.5306 8.746e-01 4.062
## sex 0.0005566 1796.5065 2.471e-08 12.538
##
## Concordance= 0.905 (se = 0.076 )
## Likelihood ratio test= 10.62 on 2 df, p=0.005
## Wald test = 2.72 on 2 df, p=0.3
## Score (logrank) test = 7 on 2 df, p=0.03
```

Information about the the \(h_0(t)\) part is missing. However, this part is needed to produce e.g. predictions or generate survival curves. This can automatically be done with functions as `survfit`

`predict.coxph`

and `termplot`

, but it is not always clear how they generate their results. That is exactly what this post will bring you hopefully, clarity about what these functions provide . To start from the beginning we will start with the `basehaz`

function.

**It all starts with the cumulative baseline hazard**

When you look at the R Documentation file of the `basehaz`

function (`?basehaz`

) you can read that this function is an Alias for the `survfit`

function (which means that either the `basehaz`

or `survfit`

function produce the same result. I will show that below) and you can read that `basehaz`

produces the cumulative hazard (function).

The cumulative *baseline* hazard function can be calculated in two ways, when you apply an empty Cox model (or by using the observed data, i.e. no model), `basehaz`

will give the same result as the Nelson-Aalen estimator and when we derive it from the Cox model including predictors `basehaz`

will use the Breslow estimator.

I will show you how to calculate the cumulative baseline hazard manually for observed data and an empty Cox model and subsequently the Breslow estimator for a Cox model including predictors.

**The Nelson-Aalen estimator**

The formula for the Nelson-Aalen estimator is,

\[\widetilde{H}(t)= \sum_{t_i\le t}\frac{d_i}{n_i}\]

where \(d_i\) are the number of events of interest at time t, and \(n_i\) is the number of observations at risk. When we apply this formula manually we get,

```
time <- c(1,3,5,6, 2, 7, 9, 11)
status <- c(1, 0, 1, 1, 1, 0, 1, 1)
df <- data.frame(time, status)
df <- df[order(time), ] # order on time and events
d <- df$status
n <- length(d):1
H0 <- cumsum(d / n)
H0
```

```
## [1] 0.1250000 0.2678571 0.2678571 0.4678571 0.7178571 0.7178571 1.2178571
## [8] 2.2178571
```

and the same result is provided by the `basehaz`

function after we have fitted an empty Cox model.

```
time <- c(1,3,5,6, 2, 7, 9, 11)
status <- c(1, 0, 1, 1, 1, 0, 1, 1)
df <- data.frame(time, status)
fit_cox <- coxph(Surv(time, status) ~ 1 , data=df, method = "breslow")
basehaz(fit_cox)
```

```
## hazard time
## 1 0.1250000 1
## 2 0.2678571 2
## 3 0.2678571 3
## 4 0.4678571 5
## 5 0.7178571 6
## 6 0.7178571 7
## 7 1.2178571 9
## 8 2.2178571 11
```

**The Breslow estimator**

The Breslow estimator for the baseline cumulative hazard is defined as (Hosmer and Lemeshow, 1999),

\[\widetilde{H_0}(t)=\sum_{t_i\leq t} \frac{d_i}{\sum e^{(X\beta)}}\] where \(d_i\) stands for the number of events of interest at time t and \(X\beta\) for the linear predictor scores (LP) .

When we apply this formula manually we get,

```
df <- data.frame(time, status, sex, age)
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
breslow_est <- function(time, status, X, B){
data <-
data.frame(time, status, X)
data <-
data[order(data$time), ]
t <-
unique(data$time)
k <-
length(t)
h <-
rep(0,k)
for(i in 1:k) {
lp <- (data.matrix(data[,-c(1:2)]) %*% B)[data$time>=t[i]]
risk <- exp(lp)
h[i] <- sum(data$status[data$time==t[i]]) / sum(risk)
}
res <- cumsum(h)
return(res)
}
H0 <- breslow_est(time=df$time, status=df$status, X=fit_cox$x, B=fit_cox$coef)
H0
```

```
## [1] 3.442456e-13 5.942770e-12 5.942770e-12 1.096574e-10 1.897298e-09
## [6] 1.897298e-09 6.646862e-08 9.459174e-07
```

This gives the same result as when the `basehaz`

function is used after a Cox model is fitted and when we set `centered=FALSE`

.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
basehaz(fit_cox, centered = FALSE)
```

```
## hazard time
## 1 3.442456e-13 1
## 2 5.942770e-12 2
## 3 5.942770e-12 3
## 4 1.096574e-10 5
## 5 1.897298e-09 6
## 6 1.897298e-09 7
## 7 6.646862e-08 9
## 8 9.459174e-07 11
```

This is not a surprise, because the cumulative baseline hazard is calculated when the predictors are zero, i.e. when sex=0 and age=0 (more about this below under “When provide the `basehaz`

and `survfit`

functions the same results?”)

As a result of the `basehaz`

function we get a dataframe with two columns, hazard and time. The column that is called hazard is the cumulative baseline hazard function \(H_0(t)\). Let’s take a closer look a the `basehaz`

function.

### What is the difference between centered=TRUE and centered=FALSE in basehaz?

We saw above that the manual version of the Breslow estimator and the `basehaz`

function give the same results when `centered=FALSE`

. When we choose for `centered=TRUE`

the results are different. But of course we can make them equal. Than we have to mean center the linear predictor values.

If we look at the documentation of the `basehaz`

function under centered it says “if TRUE return data from a predicted survival curve at the mean values of the covariates `fit$mean`

, if FALSE return a prediction for all covariates equal to zero”. Centering means that we calculate values against a sample reference value, and that reference value are the values we find under `fit$mean`

.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
fit_cox$means
```

```
## age sex
## 39.625 0.000
```

We see that the “means” are 39.625 and 0 respectively. Here we see that for the binary covariate sex, the reference value 0 is chosen. If we take a look at the R documentation for the `coxph`

function and we look further at the documentation for the `coxph.object`

(This class of objects is returned by the coxph class of functions to represent a fitted proportional hazards model) and look under means we find the following explanation `vector of values used as the reference for each covariate`

and that is exactly what we get.

**Mean centering the linear predictor values**

Now we re-estimate the cumulative baseline hazard but now use the centered version of the linear predictor values, i.e. those calculated using `fit$mean`

and subtracting these from the individual linear predictor values.

in formula form.

\[ LP_{sample} = \sum fit_{mean}*\beta's\] \[ LP_{indiv} = \sum X_i*\beta's\]

\[ LP_{centered} = LP_{indiv} - LP{sample}\]

To implement this we adjust the `breslow_est`

function a little bit and include the LP_sample and LP_indiv objects that calculate the linear predictor values at the sample and individual values.

```
df <- data.frame(time, status, sex, age)
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
breslow_est_adj <- function(time, status, X, B){
data <-
data.frame(time,status,X)
data <-
data[order(data$time), ]
t <-
unique(data$time)
k <-
length(t)
h <-
rep(0,k)
for(i in 1:k) {
LP_sample <- sum(fit_cox$means * coef(fit_cox))
LP_indiv <- c((0.6338168*(data$age)) + (-7.4935992*data$sex) )
lp_centered <- (LP_indiv - LP_sample)[data$time>=t[i]]
risk <- exp(lp_centered)
h[i] <- sum(data$status[data$time==t[i]]) / sum(risk)
}
res <- cumsum(h)
return(res)
}
H0 <- breslow_est_adj(time=df$time, status=df$status, X=fit_cox$x, B=fit_cox$coef)
H0
```

```
## [1] 2.780815e-02 4.800566e-01 4.800566e-01 8.858118e+00 1.532636e+02
## [6] 1.532636e+02 5.369326e+03 7.641106e+04
```

Taking their difference is the same as using `centered=TRUE`

in the `basehaz`

function (the default).

When we apply that you can seethat we get the same result!

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
basehaz(fit_cox, centered = TRUE)
```

```
## hazard time
## 1 2.780808e-02 1
## 2 4.800556e-01 2
## 3 4.800556e-01 3
## 4 8.858100e+00 5
## 5 1.532633e+02 6
## 6 1.532633e+02 7
## 7 5.369320e+03 9
## 8 7.641099e+04 11
```

**Mean centering the covariate values**

In the Cox model centering is done to generate estimates against some kind of sample reference values (to represent some kind of reference population). For this the mean values of covariates are used as we have seen when we centered the linear predictor values. This seems a little bit awkward for categorical and binary variables like “sex” in our example. Sex represents male and female persons and taking the mean would say that we represent a population containing “average” sex persons. Therefore, for binary variables also the reference value of the binary variable is used and that is what is chosen for in the `survival`

package. Mean centering the covariate values is the same as subtracting the mean of the covariate values from each person’s covariate value. When we do that we get the same result as mean centering the linear predictor values.

```
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
X_centered <- sweep(fit_cox$x, 2, fit_cox$means, "-")
H0 <- breslow_est(time=df$time, status=df$status, X=X_centered, B=fit_cox$coef)
H0
```

```
## [1] 2.780808e-02 4.800556e-01 4.800556e-01 8.858100e+00 1.532633e+02
## [6] 1.532633e+02 5.369320e+03 7.641099e+04
```

### When provide the basehaz and survfit functions the same results?

I have already mentioned that the `basehaz`

function is an alias for the `survfit`

function. That means that we can produce the same results for the cumulative baseline hazard function with the `survfit`

function as with the `basehaz`

function. Let’s explore that a little bit.

In the description of the centered term in the basehaz function it says `if TRUE return data from a predicted survival curve at the mean values of the covariates fit$mean, if FALSE return a prediction for all covariates equal to zero.`

This latter means that we have to set all covariates to 0 in the `survfit`

function to get the same results as the `basehaz`

function with `centered=FALSE`

. Let’s do that.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
fit_surv <- survfit(fit_cox, newdata = data.frame(age=0, sex=0))
fit_surv$cumhaz
```

```
## [1] 3.442456e-13 5.942770e-12 5.942770e-12 1.096574e-10 1.897298e-09
## [6] 1.897298e-09 6.646862e-08 9.459174e-07
```

And that is indeed the case!

Now let’s see what we get when we use the default settings in the `survfit`

function.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
fit_surv <- survfit(fit_cox)
fit_surv$cumhaz
```

```
## [1] 2.780808e-02 4.800556e-01 4.800556e-01 8.858100e+00 1.532633e+02
## [6] 1.532633e+02 5.369320e+03 7.641099e+04
```

Now we see that We get the same result as the `basehaz`

function with `centered=TRUE`

. This means that as default the `survfit`

function uses the mean centered variables with mean values that are stored under `fit_cox$means`

.

We can check that when we give the values of the covariates to same value as under `fit_cox$means`

.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
fit_surv <- survfit(fit_cox, newdata = data.frame(age=39.625, sex=0))
fit_surv$cumhaz
```

```
## [1] 2.780808e-02 4.800556e-01 4.800556e-01 8.858100e+00 1.532633e+02
## [6] 1.532633e+02 5.369320e+03 7.641099e+04
```

We see that we indeed get the same results as for `survfit(fit_cox)`

that we previously used!

The survfit function can not only be used to calculate the cumulative baseline hazard function, i.e. the cumulative hazard for a reference group, but also for other groups, depending on the values of the covariates. Eventually it can be used to produce survival curves for all kind of values for the covariates. I will show that later in another post.

### What kind of results do we get with the predict.coxph function?

Now we now more about the cumulative baseline hazard function and the meaning of centering, it is easy to understand what kind of information the `predict.coxph`

function generates.

See let’s take a closer look at this function and also in what way it provides results. In the `predict.coxph`

function we can choose for the options `type=c("lp", "risk", "expected", "terms", "survival")`

. What do they all mean?

**type=“lp”**

As default the predict.coxph function generates the “lp” values or linear predictor values.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox) # Same as predict(fit_cox, type="lp")
```

```
## [1] 3.5189684 0.3498842 -2.1853832 -5.9882842 -0.3961355 -5.4666703
## [7] -8.6357545 -11.1710219
```

These are also stored under the `fit_cox`

object and extracted by typing `fit_cox$linear.predictors`

.

`fit_cox$linear.predictors`

```
## [1] 3.5189684 0.3498842 -2.1853832 -5.9882842 -0.3961355 -5.4666703
## [7] -8.6357545 -11.1710219
```

LP scores can also be calculated manually. We know now that the linear predictor of the Cox model is by default mean centered, i.e. calculated relative to the value of the sample mean (which was explained under `**Mean centering the linear predictor values**`

above and also further explained in light of the `predict.coxph`

function below under `**What does it mean that predictions of type "risk" are relative to the sample**`

), so we have to apply that here also to get the correct LP values.

```
df <- data.frame(time, status, sex, age)
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
LP_indiv <- c((0.6338168*(df$age)) + (-7.4935992*df$sex) )
LP_sample <- c((0.6338168*(39.625)) + (-7.4935992*0) )
LP_sample
```

`## [1] 25.11499`

```
LP <- LP_indiv - LP_sample
LP
```

```
## [1] 3.5189677 0.3498837 -2.1853835 -5.9882843 -0.3961355 -5.4666699
## [7] -8.6357539 -11.1710211
```

The same result is obtained by using the reference option in the predict.coxph function

`predict(fit_cox, reference="sample")`

```
## [1] 3.5189684 0.3498842 -2.1853832 -5.9882842 -0.3961355 -5.4666703
## [7] -8.6357545 -11.1710219
```

And this again is the same as mean centering the covariate values and subsequently calculating the linear predictor values.

```
df <- data.frame(time, status, sex, age)
LP <- c((0.6338168*(df$age-39.625)) + (-7.4935992*df$sex) )
LP
```

```
## [1] 3.5189677 0.3498837 -2.1853835 -5.9882843 -0.3961355 -5.4666699
## [7] -8.6357539 -11.1710211
```

Now we know that the `predict.coxph`

function uses as default a mean centered linear predictor.

**What does it mean that predictions of type “risk” are relative to the sample**

If we look at the R documentation of the predict.coxph function we find under details the following explanation: “The Cox model is a relative risk model; predictions of type”linear predictor”, “risk”, and “terms” are all relative to the sample from which they came. By default, the reference value for each of these is the mean covariate within strata.”

Let’s see what this means for the calculation of the LP scores. We first calculate the LP scores by using the mean of the covariates (mean of 39.625 for age and the reference value of 0 for sex):

```
df <- data.frame(time, status, sex, age)
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
LP_sample <- c((0.6338168*(39.625)) + (-7.4935992*0))
LP_sample
```

`## [1] 25.11499`

Than we calculate the LP scores for all individuals in our dataset.

```
df <- data.frame(time, status, sex, age)
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
LP_indiv <- c((0.6338168*(df$age)) + (-7.4935992*df$sex) )
LP_indiv
```

`## [1] 28.63396 25.46487 22.92961 19.12671 24.71886 19.64832 16.47924 13.94397`

Than we subtract the LP score at the mean of the sample (mean of the covariates) from the LP scores of the persons in our sample and exponentiate these values.

`exp(LP_indiv - LP_sample)`

```
## [1] 3.374957e+01 1.418903e+00 1.124346e-01 2.507963e-03 6.729155e-01
## [6] 4.225279e-03 1.776396e-04 1.407626e-05
```

We than get the same results as the “risk” results, but now see more clear that the LP and risk scores are calculated against the mean values of the covariates in the sample.

**type=“risk**

When we use this option hazard ratio’s are calculated with as the reference value the LP scores at the means of the covariates.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox, type="risk")
```

```
## [1] 3.374960e+01 1.418903e+00 1.124346e-01 2.507963e-03 6.729155e-01
## [6] 4.225278e-03 1.776395e-04 1.407625e-05
```

The function will therefore produce the same results when we exponentiate the LP values.

```
df <- data.frame(time, status, sex, age)
fit_cox <-
coxph(Surv(time, status) ~ age + sex, x=TRUE, data=df, method = "breslow")
LP_indiv <- c((0.6338168*(df$age)) + (-7.4935992*df$sex) )
LP_sample <- c((0.6338168*(39.625)) + (-7.4935992*0) )
LP_sample
```

`## [1] 25.11499`

```
LP <- LP_indiv - LP_sample
risk <- exp(LP)
risk
```

```
## [1] 3.374957e+01 1.418903e+00 1.124346e-01 2.507963e-03 6.729155e-01
## [6] 4.225279e-03 1.776396e-04 1.407626e-05
```

**type=“expected”**

When we use this option we get the following results.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox, type="expected")
```

```
## 1 2 3 4 5 6 7 8
## 0.9385114 0.6811525 0.9959573 0.3843788 0.3230369 0.6475801 0.9538031 1.0755799
```

In the R documentation is described that: “Predictions of type”expected” incorporate the baseline hazard and are thus absolute instead of relative”.

For these predictions we need the cumulative baseline hazard function, which we can easily obtain by the `basehaz`

function. To get the same result as the `predict.coxph`

function we first use the default settings of the `basehaz`

function, i.e. `centered=TRUE`

and than multiply these values with the (default) (mean centered) linear predictor values.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
H0 <- basehaz(fit_cox)
LP <- predict(fit_cox, type="lp")
H0[, 1]*exp(LP)
```

```
## [1] 0.93851138 0.68115250 0.05397488 0.02221579 103.13326837
## [6] 0.64758014 0.95380312 1.07557986
```

**type=“terms”**

When we use this option we get the following values. These are the linear predictor values for each covariate (term) separately.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox, type="terms")
```

```
## age sex
## 1 11.0125677 -7.493599
## 2 7.8434834 -7.493599
## 3 5.3082161 -7.493599
## 4 1.5053150 -7.493599
## 5 -0.3961355 0.000000
## 6 -5.4666703 0.000000
## 7 -8.6357545 0.000000
## 8 -11.1710219 0.000000
## attr(,"constant")
## [1] 25.11499
```

The information under `attr(,"constant")`

is the linear predictor value of the reference population. As default value `type="terms"`

provides the linear predictor values of all terms (covariates) in the model. With `terms`

it is possible to get the linear predictor values for each separate covariate in the model.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox, type="terms", terms = "age") # same as terms = 1
```

```
## age
## 1 11.0125677
## 2 7.8434834
## 3 5.3082161
## 4 1.5053150
## 5 -0.3961355
## 6 -5.4666703
## 7 -8.6357545
## 8 -11.1710219
## attr(,"constant")
## [1] 25.11499
```

These linear predictor values are calculated as,

```
LP_age <- 0.6338168*df$age - (0.6338168*39.625)
LP_age
```

```
## [1] 11.0125669 7.8434829 5.3082157 1.5053149 -0.3961355 -5.4666699
## [7] -8.6357539 -11.1710211
```

and for the covariate sex,

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox, type="terms", terms = "sex") # same as terms = 2
```

```
## sex
## 1 -7.493599
## 2 -7.493599
## 3 -7.493599
## 4 -7.493599
## 5 0.000000
## 6 0.000000
## 7 0.000000
## 8 0.000000
## attr(,"constant")
## [1] 25.11499
```

These linear predictor values are calculated as.

```
LP_age <- -7.4935992*df$sex - (-7.4935992*0)
LP_age
```

```
## [1] -7.493599 -7.493599 -7.493599 -7.493599 0.000000 0.000000 0.000000
## [8] 0.000000
```

So, each covariate uses it’s own reference value.

**type=“survival”**

This option is used to calculate survival probabilities.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
predict(fit_cox, type="survival")
```

```
## 1 2 3 4 5 6 7 8
## 0.3912098 0.5060335 0.3693697 0.6808734 0.7239472 0.5233106 0.3852730 0.3410999
```

Using the cumulative hazard to calculate survival probabilities we can make use of the following formula,

\[S(t) = exp(-H(t)\]

We have seen under `**type="expected"**`

how to calculate the cumulative hazard values. We can make use of the same calculations and include these in the formula for the survival probability.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
H0 <- basehaz(fit_cox)
LP <- predict(fit_cox, type="lp")
exp(-H0[, 1]*exp(LP))
```

```
## [1] 3.912098e-01 5.060335e-01 9.474559e-01 9.780292e-01 1.621028e-45
## [6] 5.233106e-01 3.852730e-01 3.410999e-01
```

This is of course the same as,

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
exp(-predict(fit_cox, type="expected"))
```

```
## 1 2 3 4 5 6 7 8
## 0.3912098 0.5060335 0.3693697 0.6808734 0.7239472 0.5233106 0.3852730 0.3410999
```

### Where can we use the termplot function for?

If we take a look at the R documentation of the `termplot`

function we can read that with this function it is possible to plot regression terms. We can further read that we can extract the information that is plotted when we set `plot=FALSE`

.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
termplot(fit_cox, plot=FALSE)
```

```
## $age
## x y
## 1 22 -11.1710219
## 2 26 -8.6357545
## 3 31 -5.4666703
## 4 39 -0.3961355
## 5 42 1.5053150
## 6 48 5.3082161
## 7 52 7.8434834
## 8 57 11.0125677
##
## $sex
## x y
## 1 0 0.000000
## 2 1 -7.493599
##
## attr(,"constant")
## [1] 25.11499
```

We see that we get the same information as that is provided by the `predict.coxph`

function and than option `type="terms"`

. The values that are provided are the linear predictor values related to each covariate. For continuous predictors the ordered unique values are provided and for a factor the values related to each category (level).

There is also the possibility to choose for specific covariate (term) values as was possible with the `predict.coxph`

function.

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
termplot(fit_cox, plot=FALSE, terms="age")
```

```
## $age
## x y
## 1 22 -11.1710219
## 2 26 -8.6357545
## 3 31 -5.4666703
## 4 39 -0.3961355
## 5 42 1.5053150
## 6 48 5.3082161
## 7 52 7.8434834
## 8 57 11.0125677
##
## attr(,"constant")
## [1] 25.11499
```

To make a plot of the linear predictor values we can set `plot=TRUE`

(default).

```
df <- data.frame(time, status, sex, age)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data=df, method = "breslow")
termplot(fit_cox, plot=TRUE, terms="age")
```

This is especially informative if the model would include non-linear or spline terms.

### From hazard to survival

What we all can do with these functions in terms of survival probabilities and survival curves will be explained in another post (probably published in February 2023).

**References**

Hosmer, D.W. and Lemeshow, S. (1999) Applied Survival Analysis, Regression Modeling of Time to Event Data. John Wiley and Sons, New York.

Klein, J.P. and Moeschberger, S. (2003) Survival Analysis: Techniques for Censored and Truncated Data. 2nd edition. Springer, Berlin.

Moore, D.F. (2016) Applied Survival Analysis Using R. Use R. Springer, Berlin.